Johns Hopkins University
Modeling Continuous Glucose Monitor (CGM) Data
A Bayesian Functional Model for CGM Data
Model Performance
Model Inference on CGM
Extending our approach
Estimates collated by the CDC using data from the National Heath and Nutrition Examination Survey.
13.5% (40.1 million) in 20231
$412.9 billion in expenditures in 20222
Increasing complications: kidney failure, stroke, heart failure
Example CGM device and readout from NIDDK.
Estimates blood glucose every 1-15 minutes, \(\leq 15\) days
Recommended for diabetes management1
System response to food, exercise, etc.
\(N = 89\) adults with type 2 diabetes
Randomized, crossover feeding study1
Meals from study center
DASH4D vs. Comparison
Wore blinded CGM
DASH4D reduces mean glucose, increases time in normal range2
How does DASH4D affect postprandial glucose response (PPGR)?
Example DASH4D-compliant meal from the study website.
\(65\) participants
\(65\) participants
\(768\) PPGR curves
Mixed effects model at each time \(t\)
\[PPGR^{(60)}_{ij} = \beta_0^{(60)} + DASH4D_{ij} \times \boxed{\boldsymbol{\beta_1^{(60)}}} + (\ldots) + U_i^{(60)} + \epsilon_{ij}^{(60)}\]
\[PPGR^{(120)}_{ij} = \beta_0^{(120)} + DASH4D_{ij} \times \boxed{\boldsymbol{\beta_1^{(120)}}} + (\ldots) + U_i^{(120)} + \epsilon_{ij}^{(120)}\]
\[PPGR^{(t)}_{ij} = \beta_0^{(t)} + DASH4D_{ij} \times \boxed{\boldsymbol{\beta_1^{(t)}}} + (\ldots) + U_i^{(t)} + \epsilon_{ij}^{(t)}\]
Modeling Continuous Glucose Monitor (CGM) Data
A Bayesian Functional Model for CGM Data
Model Performance
Model Inference on CGM
Extending our approach
Sources of Structure:
\[\boxed{PPGR_{ij}(t) = \beta_0(t) + DASH4D_{ij} \times \boldsymbol{\beta_1(t)} + (\ldots) + U_i(t) + \epsilon_{ij}(t)}\]
\(\{1, t, \ldots \}\): chosen functions
\(\{b_{i0}, b_{i1}, \ldots \}\): correlated RE
\(\phi_k(t)\): data-driven functional PCs
\(\xi_{ik}\): scores
\[PPGR_{ij}(t) = \beta_0(t) + DASH4D_{ij} \times \beta_1(t) + (\ldots) + \sum_{k = 1}^K \xi_{ik} \phi_k(t) + \epsilon_{ij}(t)\]
Standard approach1:
Estimate \(\beta_p(t)\)
Estimate \(\widehat{\phi}_k(t)\) from residuals
Condition on splines, \(\widehat{\phi}_k(t)\)
Fit linear mixed effects model
What about uncertainty in \(\widehat{\phi}_k(t)\)?
Account for \(\widehat{\phi}_k(t)\) uncertainty?
Changed \(\beta_p(t)\) inference?
Challenges modeling FPCs:
Constrained to be orthogonal (Stiefel manifold)
Maintain smoothness
Literature:
“A Geometric Approach to Maximum Likelihood Estimation of the Functional Principal Components From Sparse Longitudinal Data” (Peng & Paul, 2009)
“Generalized Multilevel Function-on-Scalar Regression and Principal Component Analysis” (Goldsmith et al., 2015)
“Monte Carlo Simulation on the Stiefel Manifold via Polar Expansion” (Jauch et al., 2021)
“Functional principal component models for sparse and irregularly spaced data by Bayesian inference” (Ye, 2023)
“Bayesian Functional Principal Components Analysis via Variational Message Passing with Multilevel Extensions” (Nolan et al., 2023)
Substantial dimension reduction
FPC Functions: \(\text{dim}(\phi_k(t)) = \infty\)
FPC Vectors: \(\text{dim}(\phi_k(t)) >> 100\)
FPC Spline Coefficients: \(\text{dim}(\psi_k) \in [20, 50]\)
Choice of basis \(\mathbf{B}(t)\) is crucial
Restrict to well-behaved \(\phi_k(t)\)
Shapes the \(\psi_k\) space
Penalized spline priors1 controlling “wiggliness”
Add \(-h_k \int [\phi_k''(t)]^2 dt\) to log-likelihood
Unique smoothing parameters \(h_k\)
Model \(\Psi = [\psi_1 | \ldots | \psi_K]\) using parameter expansion2
Sample unconstrained \(\mathbf{X}, X_{i,j} \sim N(0, 1)\)
Take SVD \(\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma} \mathbf{V}^t\)
\(\mathbf{UV}^t\) is uniform on Stiefel manifold3
\[f(\psi_k|h_k) \propto \text{MVN}\left(\mathbf{0}, (h_k\mathbf{P})^{-1}\right) \times \mathbf{I}(\psi_k \text{ are orthonormal})\]
\(h_k\) smoothing parameters have Gamma\((\alpha, \beta)\) priors
Smoothing spline prior1 with additional orthonormality constraint
Proposition: The joint prior distribution on \(\psi_k, h_k\) is proper if and only if \(2\beta\) is greater than the first eigenvalue of the penalty matrix \(\mathbf{P}\)
Modeling Continuous Glucose Monitor (CGM) Data
A Bayesian Functional Model for CGM Data
Model Performance
Model Inference on CGM
Extending our approach
Real data and canonical examples
Similar/improved accuracy of point estimates
Nominal coverage of credible intervals
Simulations
\(50\) functions with \(500\) obs: \(\approx 35\) minutes
\(500\) functions with \(50\) obs: \(\approx 15\) minutes
Scales linearly in functions and obs.
Modeling Continuous Glucose Monitor (CGM) Data
A Bayesian Functional Model for CGM Data
Model Performance
Model Inference on CGM
Extending our approach
\[PPGR_{ij}(t) = \color{#619CFF}{\beta_0(t) + DASH4D_{ij} \times \beta_1(t) + (\ldots)} + \color{#00BA38}{\sum_{k = 1}^K \xi_{ik} \phi_k(t)} + \color{#F8766D}{\epsilon_{ij}(t)}\]
\[PPGR_{ij}(t) = \beta_0(t) + DASH4D_{ij} \times \beta_1(t) + (\ldots) + \sum_{k = 1}^K \xi_{ik} \phi_k(t) + \epsilon_{ij}(t)\]
\[PPGR_{ij}(t) = \beta_0(t) + DASH4D_{ij} \times \beta_1(t) + (\ldots) + \sum_{k = 1}^K \xi_{ik} \phi_k(t) + \epsilon_{ij}(t)\]
Fully-Bayesian: estimates \(+\) uncertainty for any quantity of interest
Accounts for all known sources of correlation and uncertainty
Computationally efficient and stable
Implemented in standard software (STAN)
We have developed an accessible R package
Modeling Continuous Glucose Monitor (CGM) Data
A Bayesian Functional Model for CGM Data
Model Performance
Model Inference on CGM
Extension to Multivariate, Sparse Data
Helicopactor pylori \(\Rightarrow\) child growth
May 2007 - Feb 2011 near Lima City, Peru
Longitudinal cohort of \(N = 197\) selected randomly from census
Length and weight measures2
Z-scored to age/gender WHO standards
Increasing sparsity of observation over time
Missing/cancelled visits
Can we (dynamically) infer growth trajectories?
\[\begin{pmatrix}W_{i}(t)\\ L_i(t) \end{pmatrix} = \begin{pmatrix}\mu^{(W)}(t; X_{i})\\ \mu^{(L)}(t; X_{i}) \end{pmatrix} + \sum_{k = 1}^K \xi_{ik} \begin{pmatrix}\phi^{(W)}_{k}(t)\\ \phi^{(L)}_k(t) \end{pmatrix} + \begin{pmatrix}\epsilon^{(W)}_i(t) \\ \epsilon^{(L)}_i(t) \end{pmatrix}\]
\(\mu^{(W)}(t; X_i), \mu^{(L)}(t; X_i)\): variate-specific means
\(\xi_{ik} \sim N(0, \lambda_k)\): independent scores
\(\Phi_k(t) = \begin{pmatrix} \phi^{(W)}_k(t) \\ \phi^{(L)}_k(t) \end{pmatrix}\): joint, data-driven FPCs
\(\epsilon^{(W)}_i(t), \epsilon^{(L)}_i(t)\): independent, normal errors
Adjustments:
Sparsity handled by splines
Concatenate the FPCs
Scale variates in pre-processing
More robust posterior FPC alignment
Simulations:
2-4 variables
Compared to available software1
\(25\)-\(50\%\) error reduction
Accurate estimates
Competitive computation
Nominal coverage
Learn \(\phi_k^{(p)}(t)\), \(\mu^{(p)}(t; X_i)\)
New scores \(\xi_{ik}\) are conditionally Gaussian
Fit local linear mixed effects models
Smooth along the functional domain
Point-wise confidence bands using the smoothing operator
Joint confidence bands using analytic procedure or bootstrap
Moving the needle for fasting glucose?
Project \(Y_i(t) = \mathbf{B}(t) \eta_i\), only model \(\eta_i\)
Choose \(\mathbf{B}(t)\) to be vector orthonormal (matrix \(\mathbf{B}\))
Computations should not scale with observations per function
Preliminary Results:
\[g\bigl(\mathbb{E}[Y_i(t)]\bigr) = \mu(t; X_{i}) + \sum_{k = 1}^K \xi_{ik}\phi_k(t)\]